Skip to content

Native malloc fixes and cleanup#550

Merged
jbachorik merged 2 commits into
mainfrom
fix/malloc-tracer-double-accounting
Jun 8, 2026
Merged

Native malloc fixes and cleanup#550
jbachorik merged 2 commits into
mainfrom
fix/malloc-tracer-double-accounting

Conversation

@jbachorik

@jbachorik jbachorik commented May 27, 2026

Copy link
Copy Markdown
Collaborator

What does this PR do?:
Fixes reentrancy double-counting in the native malloc tracer, cleans up hook dispatch for musl, and removes dead code.

  • Add CriticalSection guard in maybeRecord to prevent profiler-internal allocations (e.g. JFR buffer growth) from re-entering the hook and being double-counted
  • Replace calloc_hook_dummy / posix_memalign_hook_dummy pass-through hooks with a simple skip: on musl where calloc delegates to malloc and posix_memalign delegates to aligned_alloc internally, we leave those GOT entries unpatched and let the inner hook record the allocation
  • Remove _orig_free / SAVE_IMPORT(free) — free is not tracked for liveness; the resolved pointer was only used for a probe cleanup where plain free() is correct
  • Remove the _calloc_hook_fn / _posix_memalign_hook_fn pre-computed pointer members and the NO_OPTIMIZE macro that existed solely for the dummy hooks
  • Gate MallocTracer::installHooks() in Libraries::refresh() on MallocTracer::running() to avoid unnecessary GOT patching work between profiling sessions

Motivation:
Profiler-internal allocations triggered inside recordMalloc (e.g. sample buffer allocation) re-enter the patched GOT and were being silently double-counted. The dummy hook approach for musl added function-call overhead on every calloc/posix_memalign without benefit.

Additional Notes:
detectNestedMalloc() is kept — it still drives the decision of whether to skip patching calloc/posix_memalign on musl.

How to test the change?:
Covered by existing integration tests:

  • NativememProfilerTest — verifies profiler.Malloc JFR events have positive size, non-zero address, and Java stack trace
  • NativememSampledProfilerTest — verifies weight and sample count from repeated allocations

For Datadog employees:

  • This PR doesn't touch any of that.
  • JIRA: PROF-14807

Unsure? Have a question? Request a review!

@jbachorik jbachorik marked this pull request as ready for review May 27, 2026 20:53
@jbachorik jbachorik requested a review from a team as a code owner May 27, 2026 20:53
@datadog-prod-us1-3

datadog-prod-us1-3 Bot commented May 27, 2026

Copy link
Copy Markdown

Pipelines

Fix all issues with BitsAI

⚠️ Warnings

🚦 4 Pipeline jobs failed

DataDog/java-profiler | gtest-asan-arm64   View in Datadog   GitLab

🔧 Fix in code (Fix with Cursor). LeakSanitizer detected memory leaks in Arguments::parse(char const*) at /go/src/github.com/DataDog/java-profiler/ddprof-lib/src/main/cpp/arguments.cpp:119:18

DataDog/java-profiler | gtest-tsan-amd64   View in Datadog   GitLab

🔧 Fix in code (Fix with Cursor). Segmentation fault while executing the binary with GTEST_DEATH_TEST_STYLE=threadsafe.

DataDog/java-profiler | gtest-tsan-arm64   View in Datadog   GitLab

🛟 This job is unlikely to succeed on retry. Please review your pipeline configuration. Containers with unready status: [build helper docker].

View all 4 failed jobs.

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 7f7ae89 | Docs | Datadog PR Page | Give us feedback!

@dd-octo-sts

dd-octo-sts Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

CI Test Results

Run: #26542176878 | Commit: 613cb9b | Duration: 12m 13s (longest job)

All 32 test jobs passed

Status Overview

JDK glibc-aarch64/debug glibc-amd64/debug musl-aarch64/debug musl-amd64/debug
8 - - -
8-ibm - - -
8-j9 - -
8-librca - -
8-orcl - - -
11 - - -
11-j9 - -
11-librca - -
17 - -
17-graal - -
17-j9 - -
17-librca - -
21 - -
21-graal - -
21-librca - -
25 - -
25-graal - -
25-librca - -

Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled

Summary: Total: 32 | Passed: 32 | Failed: 0


Updated: 2026-05-27 22:37:23 UTC

…cking

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@jbachorik jbachorik requested a review from zhengyu123 May 29, 2026 06:18

@rkennke rkennke left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I assume the added overhead of the CS in maybeRecord() is acceptable? E.g. malloc-heavy workloads?

@jbachorik

Copy link
Copy Markdown
Collaborator Author

I assume the added overhead of the CS in maybeRecord() is acceptable? E.g. malloc-heavy workloads?

I would lean towards 'yes' - but I don't have any benchmarks for that :(
For now, the feature is a kind of experimental and opt-in so we can easily reevaluate later, before making it a publicly supported feature, based on proper benchmarks.

@jbachorik jbachorik merged commit 0f996e8 into main Jun 8, 2026
99 checks passed
@jbachorik jbachorik deleted the fix/malloc-tracer-double-accounting branch June 8, 2026 14:29
@github-actions github-actions Bot added this to the 1.45.0 milestone Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants